Multiple images
ReMI: A Dataset for Reasoning with Multiple Images -- Supplementary Material
In this section, we follow the recommendations of Gebru et al. for documenting the dataset.
- For what purpose was the dataset created?
- Who created the dataset (e.g., which team, research group) and on behalf of which entity?
- Who funded the creation of the dataset?
- What do the instances that comprise the dataset represent (e.g., documents, photos)?
- How many instances are there in total (of each type, if appropriate)? Parts of the dataset have been created programmatically.
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
Kao, Kuei-Chun, Hsu, Tzu-Yin, Hong, Yunqi, Wang, Ruochen, Hsieh, Cho-Jui
Multimodal Large Language Models (MLLMs) encounter two key issues in multi-image contexts: (1) a lack of fine-grained perception across disparate images, and (2) a diminished capability to effectively reason over and synthesize information from multiple visual inputs. While various prompting methods aim to describe visual content, most existing studies focus primarily on single-image settings or specific, constrained scenarios, leaving a critical gap in understanding how MLLMs tackle more general and complex multi-image reasoning tasks. We therefore first investigate extensively how current prompting methods perceive fine-grained visual details and process visual information when dealing with multiple images. Our findings reveal that existing prompting methods fall short in attending to the needed clues and in seamlessly integrating perception and reasoning. Inspired by these findings, we propose Question-Guided Chain-of-Captions (QG-CoC), a new zero-shot, generalized prompting approach that effectively handles problems with an arbitrary number of images. We evaluate our method on various open-source and closed-source MLLMs over multi-image and single-image benchmarks. Experimental results indicate that QG-CoC achieves competitive performance across tasks and exhibits robust improvements in challenging scenarios where existing prompting methods fail.
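The abstract does not spell out the exact prompting stages, but a question-guided chain-of-captions pipeline can be sketched as three model calls: decompose the question, caption each image under its sub-question, then reason over the aggregated captions. The sketch below is an illustration of that idea only; `query_mllm` is a placeholder for whichever multimodal chat API is used, and the prompt wording is not taken from the paper.

```python
# A minimal sketch of a question-guided chain-of-captions pipeline, assuming
# some multimodal chat API behind `query_mllm`. Prompt wording is illustrative.
from typing import Callable, List

def qg_coc_answer(
    question: str,
    image_paths: List[str],
    query_mllm: Callable[[str, List[str]], str],
) -> str:
    """Answer a multi-image question via question-guided captions."""
    # 1) Decompose the question into one focused sub-question per image.
    decomposition = query_mllm(
        f"Break this question into one sub-question per image "
        f"({len(image_paths)} images), one per line: {question}",
        [],
    )
    sub_questions = [s.strip() for s in decomposition.splitlines() if s.strip()]

    # 2) Caption each image, guided by its sub-question, so the caption
    #    surfaces the fine-grained details the question actually needs.
    captions = []
    for i, (img, sub_q) in enumerate(zip(image_paths, sub_questions), start=1):
        caption = query_mllm(f"Describe only what is needed to answer: {sub_q}", [img])
        captions.append(f"Image {i}: {caption}")

    # 3) Reason over the aggregated captions (plus the images) to answer.
    context = "\n".join(captions)
    return query_mllm(
        f"Using these captions:\n{context}\nAnswer the question: {question}",
        image_paths,
    )
```

Keeping the captioning step question-guided is what separates this from plain chain-of-captions: each caption is steered toward the details the final reasoning step will actually need, which is exactly the fine-grained perception the abstract says existing prompting methods miss.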
AutoPR: Let's Automate Your Academic Promotion!
Chen, Qiguang, Yan, Zheng, Yang, Mingda, Qin, Libo, Yuan, Yixin, Li, Hanjing, Liu, Jinhao, Ji, Yiyan, Peng, Dengyun, Guan, Jiannan, Hu, Mengkang, Du, Yantao, Che, Wanxiang
As the volume of peer-reviewed research surges, scholars increasingly rely on social platforms for discovery, while authors invest considerable effort in promoting their work to ensure visibility and citations. To streamline this process and reduce the reliance on human effort, we introduce Automatic Promotion (AutoPR), a novel task that transforms research papers into accurate, engaging, and timely public content. To enable rigorous evaluation, we release PRBench, a multimodal benchmark that links 512 peer-reviewed articles to high-quality promotional posts, assessing systems along three axes: Fidelity (accuracy and tone), Engagement (audience targeting and appeal), and Alignment (timing and channel optimization). We also introduce PRAgent, a multi-agent framework that automates AutoPR in three stages: content extraction with multimodal preparation, collaborative synthesis for polished outputs, and platform-specific adaptation to optimize norms, tone, and tagging for maximum reach. When compared to direct LLM pipelines on PRBench, PRAgent demonstrates substantial improvements, including a 604% increase in total watch time, a 438% rise in likes, and at least a 2.9x boost in overall engagement. Ablation studies show that platform modeling and targeted promotion contribute the most to these gains. Our results position AutoPR as a tractable, measurable research problem and provide a roadmap for scalable, impactful automated scholarly communication.
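As a rough illustration of the three-stage flow described above (content extraction with multimodal preparation, collaborative synthesis, platform-specific adaptation), the sketch below chains placeholder model calls. The stage prompts, the single writer/critic round standing in for the multi-agent collaboration, and `call_llm` itself are assumptions, not PRAgent's actual implementation.

```python
# Rough sketch of a three-stage promotion pipeline; `call_llm` is a placeholder
# for any text/multimodal model endpoint, and the prompts are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Paper:
    title: str
    abstract: str
    figure_paths: List[str]  # figures/tables used for multimodal preparation

def promote(paper: Paper, platform: str,
            call_llm: Callable[[str, List[str]], str]) -> str:
    # Stage 1: content extraction with multimodal preparation.
    digest = call_llm(
        f"Summarize the key claims and results of:\n{paper.title}\n{paper.abstract}",
        paper.figure_paths,
    )
    # Stage 2: collaborative synthesis -- here one writer/critic round stands in
    # for the multi-agent collaboration.
    draft = call_llm(f"Write an engaging post from this digest:\n{digest}", [])
    critique = call_llm(f"Point out inaccuracies or dull phrasing in:\n{draft}", [])
    revised = call_llm(f"Revise the post.\nPost:\n{draft}\nFeedback:\n{critique}", [])
    # Stage 3: platform-specific adaptation (tone, length, hashtags).
    return call_llm(
        f"Adapt this post to the norms of {platform}, including tags:\n{revised}", []
    )
```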
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation
Sun, Yubo, Peng, Chunyi, Yan, Yukun, Yu, Shi, Liu, Zhenghao, Chen, Chi, Liu, Zhiyuan, Sun, Maosong
Visual retrieval-augmented generation (VRAG) augments vision-language models (VLMs) with external visual knowledge to ground reasoning and reduce hallucinations. Yet current VRAG systems often fail to reliably perceive and integrate evidence across multiple images, leading to weak grounding and erroneous conclusions. In this paper, we propose EVisRAG, an end-to-end framework that learns evidence-guided multi-image reasoning to address this issue. The model first observes the retrieved images and records per-image evidence, then derives the final answer from the aggregated evidence. To train EVisRAG effectively, we introduce Reward-Scoped Group Relative Policy Optimization (RS-GRPO), which binds fine-grained rewards to scope-specific tokens to jointly optimize the visual perception and reasoning abilities of VLMs. Experimental results on multiple visual question answering benchmarks demonstrate that EVisRAG delivers substantial end-to-end gains over the backbone VLM, with a 27% improvement on average. Further analysis shows that, powered by RS-GRPO, EVisRAG improves answer accuracy by precisely perceiving and localizing question-relevant evidence across multiple images and deriving the final answer from that evidence, much like a real detective. All code is available at https://github.com/OpenBMB/VisRAG.

Retrieval-Augmented Generation (RAG) equips Large Language Models (LLMs) with a knowledge retriever that accesses a curated external knowledge base, supplying task-relevant context at generation time and mitigating hallucinations arising from insufficient parametric knowledge (Lewis et al., 2020; Asai et al., 2024). However, ineffective use of retrieved information limits practical adoption in domain-specific tasks.
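The abstract's key training idea, binding fine-grained rewards to scope-specific tokens, can be illustrated with a small reward-masking sketch: each scope of the response (e.g., per-image evidence notes versus the final answer) receives its own scalar reward, applied only to its own tokens before a GRPO-style group normalization. The scope names, spans, and normalization below are illustrative assumptions, not the released RS-GRPO code.

```python
# Illustrative sketch of reward-scoped credit assignment: each scope's reward
# is applied only to the tokens inside that scope. Scope names and spans are
# made up for the example; this is not the paper's implementation.
import numpy as np
from typing import Dict, Tuple

def scoped_token_rewards(
    scope_spans: Dict[str, Tuple[int, int]],   # scope name -> (start, end) token index
    scope_rewards: Dict[str, float],           # scope name -> scalar reward
    seq_len: int,
) -> np.ndarray:
    """Assign each scope's reward only to the tokens inside that scope."""
    per_token = np.zeros(seq_len)
    for scope, (start, end) in scope_spans.items():
        per_token[start:end] = scope_rewards.get(scope, 0.0)
    return per_token

# Toy example: tokens 0-39 hold per-image evidence notes, 40-59 hold the answer.
rewards = scoped_token_rewards(
    scope_spans={"evidence": (0, 40), "answer": (40, 60)},
    scope_rewards={"evidence": 0.5, "answer": 1.0},
    seq_len=60,
)

# A GRPO-style update would normalize these rewards across a group of sampled
# responses before weighting the per-token policy-gradient loss.
group = np.stack([rewards, np.zeros_like(rewards)])   # pretend a second sample scored 0
advantages = (group - group.mean(axis=0)) / (group.std(axis=0) + 1e-6)
```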
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Singapore (0.04)
- (8 more...)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > Dominican Republic (0.04)
- (3 more...)
- Consumer Products & Services (0.68)
- Transportation (0.47)
Google's latest AI photo-editing tool means you might not need Photoshop
Gemini 2.5 Flash Image is a major image editing upgrade. We're now used to generative AI being able to create images from text prompts. The latest major upgrade to roll out in this category of AI is for Google's Gemini app. It's known as Nano Banana after the codename it had while still in testing; officially, it's called Gemini 2.5 Flash Image.
RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
Zhao, Fei, Lu, Chengqiang, Shen, Yufan, Wang, Qimeng, Qian, Yicheng, Zhang, Haoxin, Gao, Yan, Wu, Yi, Hu, Yao, Wu, Zhen, Xing, Shangyu, Dai, Xinyu
While various multimodal multi-image evaluation datasets have emerged, they are primarily based on English, and there has yet to be a Chinese multi-image dataset. To fill this gap, we introduce RealBench, the first Chinese multimodal multi-image dataset, which contains 9,393 samples and 69,910 images. RealBench distinguishes itself by incorporating real user-generated content, ensuring high relevance to real-world applications. Additionally, the dataset covers a wide variety of scenes, image resolutions, and image structures, further increasing the difficulty of multi-image understanding. Finally, we conduct a comprehensive evaluation of RealBench using 21 multimodal LLMs of different sizes, including closed-source models that support multi-image inputs as well as open-source visual and video models. The experimental results indicate that even the most powerful closed-source models still face challenges in multi-image Chinese scenarios, and there remains a noticeable performance gap of around 71.8% on average between open-source visual/video models and closed-source models. These results show that RealBench provides an important research foundation for further exploring multi-image understanding capabilities in the Chinese context.
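For concreteness, a multi-image benchmark like this is typically scored with a simple loop over samples. The JSONL field names, the exact-match criterion, and `ask_model` in the sketch below are placeholders, since the abstract does not specify RealBench's file format or metric.

```python
# Minimal sketch of scoring a model on a multi-image VQA benchmark.
# Field names and exact-match scoring are assumptions for illustration.
import json
from typing import Callable, List

def evaluate(jsonl_path: str,
             ask_model: Callable[[str, List[str]], str]) -> float:
    """Return accuracy over a JSONL file of {"question", "images", "answer"} records."""
    correct, total = 0, 0
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            pred = ask_model(sample["question"], sample["images"])
            correct += int(pred.strip() == sample["answer"].strip())
            total += 1
    return correct / max(total, 1)
```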
- Europe > Austria > Vienna (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (5 more...)
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Song, Yingjin, Du, Yupei, Paperno, Denis, Gatt, Albert
This paper introduces the TempVS benchmark, which focuses on the temporal grounding and reasoning capabilities of Multimodal Large Language Models (MLLMs) in image sequences. TempVS consists of three main tests (event relation inference, sentence ordering, and image ordering), each accompanied by a basic grounding test. TempVS requires MLLMs to rely on both visual and linguistic modalities to understand the temporal order of events. We evaluate 38 state-of-the-art MLLMs, demonstrating that models struggle to solve TempVS, with a substantial performance gap compared to human capabilities. We also provide fine-grained insights that suggest promising directions for future research. Our TempVS benchmark data and code are available at https://github.com/yjsong22/TempVS.
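As an example of how an image-ordering test of this kind can be scored, the sketch below compares a predicted order against the gold temporal order using exact match and pairwise agreement; these are common choices for ordering tasks and not necessarily the metrics reported for TempVS.

```python
# Sketch of scoring an image-ordering test: the model sees shuffled images and
# must recover the original temporal order. Metrics here are illustrative.
from itertools import combinations
from typing import Sequence

def order_scores(predicted: Sequence[int], gold: Sequence[int]) -> dict:
    """Exact-match and pairwise agreement between a predicted and gold ordering."""
    exact = float(list(predicted) == list(gold))
    # Fraction of image pairs whose relative order the model got right.
    pos_pred = {img: i for i, img in enumerate(predicted)}
    pos_gold = {img: i for i, img in enumerate(gold)}
    pairs = list(combinations(gold, 2))
    agree = sum(
        (pos_pred[a] < pos_pred[b]) == (pos_gold[a] < pos_gold[b])
        for a, b in pairs
    )
    return {"exact_match": exact, "pairwise_acc": agree / len(pairs)}

print(order_scores(predicted=[2, 0, 1, 3], gold=[0, 1, 2, 3]))
# exact_match 0.0, pairwise_acc ~ 0.67
```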
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Vision (0.93)